{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0b523e0a",
   "metadata": {},
   "source": [
    "# Lesson 4: Building a Multi-Document Agent"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a323703",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b9625ab2-71b6-4fd0-904e-42df80d3215f",
   "metadata": {
    "height": 47,
    "tags": []
   },
   "outputs": [],
   "source": [
    "from helper import get_openai_api_key\n",
    "OPENAI_API_KEY = get_openai_api_key()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3221a474-5817-4db2-af46-e029042a75a5",
   "metadata": {
    "height": 47,
    "tags": []
   },
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20adaa26",
   "metadata": {},
   "source": [
    "## 1. Setup an agent over 3 papers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48b71ff6",
   "metadata": {},
   "source": [
    "**Note**: The pdf files are included with this lesson. To access these papers, go to the `File` menu and select`Open...`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ed10a24b-d65c-4b98-a93a-94ccdb8900d0",
   "metadata": {
    "height": 200,
    "tags": []
   },
   "outputs": [],
   "source": [
    "urls = [\n",
    "    \"https://openreview.net/pdf?id=VtmBAGCN7o\",\n",
    "    \"https://openreview.net/pdf?id=6PmJoRfdaK\",\n",
    "    \"https://openreview.net/pdf?id=hSyW5go0v8\",\n",
    "]\n",
    "\n",
    "papers = [\n",
    "    \"metagpt.pdf\",\n",
    "    \"longlora.pdf\",\n",
    "    \"selfrag.pdf\",\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "555749b5",
   "metadata": {
    "height": 1203
   },
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.core.tools import FunctionTool, QueryEngineTool\n",
    "from llama_index.core.vector_stores import MetadataFilters, FilterCondition\n",
    "from typing import List, Optional\n",
    "\n",
    "def get_doc_tools(\n",
    "    file_path: str,\n",
    "    name: str,\n",
    ") -> str:\n",
    "    \"\"\"Get vector query and summary query tools from a document.\"\"\"\n",
    "\n",
    "    # load documents\n",
    "    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()\n",
    "    splitter = SentenceSplitter(chunk_size=1024)\n",
    "    nodes = splitter.get_nodes_from_documents(documents)\n",
    "    vector_index = VectorStoreIndex(nodes)\n",
    "    \n",
    "    def vector_query(\n",
    "        query: str, \n",
    "        page_numbers: Optional[List[str]] = None\n",
    "    ) -> str:\n",
    "        \"\"\"Use to answer questions over a given paper.\n",
    "    \n",
    "        Useful if you have specific questions over the paper.\n",
    "        Always leave page_numbers as None UNLESS there is a specific page you want to search for.\n",
    "    \n",
    "        Args:\n",
    "            query (str): the string query to be embedded.\n",
    "            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE \n",
    "                if we want to perform a vector search\n",
    "                over all pages. Otherwise, filter by the set of specified pages.\n",
    "        \n",
    "        \"\"\"\n",
    "    \n",
    "        page_numbers = page_numbers or []\n",
    "        metadata_dicts = [\n",
    "            {\"key\": \"page_label\", \"value\": p} for p in page_numbers\n",
    "        ]\n",
    "        \n",
    "        query_engine = vector_index.as_query_engine(\n",
    "            similarity_top_k=2,\n",
    "            filters=MetadataFilters.from_dicts(\n",
    "                metadata_dicts,\n",
    "                condition=FilterCondition.OR\n",
    "            )\n",
    "        )\n",
    "        response = query_engine.query(query)\n",
    "        return response\n",
    "        \n",
    "    \n",
    "    vector_query_tool = FunctionTool.from_defaults(\n",
    "        name=f\"vector_tool_{name}\",\n",
    "        fn=vector_query\n",
    "    )\n",
    "    \n",
    "    summary_index = SummaryIndex(nodes)\n",
    "    summary_query_engine = summary_index.as_query_engine(\n",
    "        response_mode=\"tree_summarize\",\n",
    "        use_async=True,\n",
    "    )\n",
    "    summary_tool = QueryEngineTool.from_defaults(\n",
    "        name=f\"summary_tool_{name}\",\n",
    "        query_engine=summary_query_engine,\n",
    "        description=(\n",
    "            f\"Useful for summarization questions related to {name}\"\n",
    "        ),\n",
    "    )\n",
    "\n",
    "    return vector_query_tool, summary_tool"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0d8f3185-3221-4b00-bd38-41d36e4a3307",
   "metadata": {
    "height": 149,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Getting tools for paper: metagpt.pdf\n",
      "Getting tools for paper: longlora.pdf\n",
      "Getting tools for paper: selfrag.pdf\n"
     ]
    }
   ],
   "source": [
    "# from utils import get_doc_tools\n",
    "from pathlib import Path\n",
    "\n",
    "paper_to_tools_dict = {}\n",
    "for paper in papers:\n",
    "    print(f\"Getting tools for paper: {paper}\")\n",
    "    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)\n",
    "    paper_to_tools_dict[paper] = [vector_tool, summary_tool]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "609184c7",
   "metadata": {
    "height": 47
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'longlora.pdf': [<llama_index.core.tools.function_tool.FunctionTool object at 0x7f248966fe10>,\n",
      "                  <llama_index.core.tools.query_engine.QueryEngineTool object at 0x7f2488706450>],\n",
      " 'metagpt.pdf': [<llama_index.core.tools.function_tool.FunctionTool object at 0x7f24889933d0>,\n",
      "                 <llama_index.core.tools.query_engine.QueryEngineTool object at 0x7f2488dd3690>],\n",
      " 'selfrag.pdf': [<llama_index.core.tools.function_tool.FunctionTool object at 0x7f248880dd50>,\n",
      "                 <llama_index.core.tools.query_engine.QueryEngineTool object at 0x7f248880cf50>]}\n"
     ]
    }
   ],
   "source": [
    "from pprint import pprint\n",
    "pprint(paper_to_tools_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0e541bdd-14e1-41b6-81b5-b1bfda078d07",
   "metadata": {
    "height": 30,
    "tags": []
   },
   "outputs": [],
   "source": [
    "initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "bff58c52",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "from llama_index.llms.openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-3.5-turbo\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "2f2c6a9f",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(initial_tools)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a124a438-5609-402e-8642-69d1088cb9ad",
   "metadata": {
    "height": 166,
    "tags": []
   },
   "outputs": [],
   "source": [
    "from llama_index.core.agent import FunctionCallingAgentWorker\n",
    "from llama_index.core.agent import AgentRunner\n",
    "\n",
    "agent_worker = FunctionCallingAgentWorker.from_tools(\n",
    "    initial_tools, \n",
    "    llm=llm, \n",
    "    verbose=True\n",
    ")\n",
    "agent = AgentRunner(agent_worker)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "17409d4c-05a9-4bf4-b74f-75135fa3cb6b",
   "metadata": {
    "height": 81,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results\n",
      "=== Calling Function ===\n",
      "Calling function: vector_tool_longlora with args: {\"query\": \"evaluation dataset\"}\n",
      "=== Function Output ===\n",
      "PG19 test split\n",
      "=== Calling Function ===\n",
      "Calling function: vector_tool_longlora with args: {\"query\": \"evaluation results\"}\n",
      "=== Function Output ===\n",
      "The evaluation results show that the models achieve better perplexity with longer context sizes. Increasing the context window size leads to improved perplexity values. Additionally, the models are fine-tuned on different context lengths, such as 100k, 65536, and 32768, and achieve promising results on these extremely large settings. However, there is some perplexity degradation observed on small context sizes for the extended models, which is a known limitation of Position Interpolation.\n",
      "=== LLM Response ===\n",
      "The evaluation dataset used in LongLoRA is the PG19 test split. \n",
      "\n",
      "Regarding the evaluation results, the models in LongLoRA achieve better perplexity with longer context sizes. Increasing the context window size leads to improved perplexity values. The models are fine-tuned on different context lengths, such as 100k, 65536, and 32768, and achieve promising results on these extremely large settings. However, there is some perplexity degradation observed on small context sizes for the extended models, which is a known limitation of Position Interpolation.\n"
     ]
    }
   ],
   "source": [
    "response = agent.query(\n",
    "    \"Tell me about the evaluation dataset used in LongLoRA, \"\n",
    "    \"and then tell me about the evaluation results\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ace340b1-761f-4058-be41-68cf131541e4",
   "metadata": {
    "height": 47,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Added user message to memory: Give me a summary of both Self-RAG and LongLoRA\n",
      "=== Calling Function ===\n",
      "Calling function: summary_tool_selfrag with args: {\"input\": \"Self-RAG\"}\n",
      "=== Function Output ===\n",
      "Self-RAG is a framework that enhances the quality and factuality of a large language model through retrieval and self-reflection. It trains a single arbitrary language model to adaptively retrieve passages on-demand, generate text informed by these passages, and reflect on both the retrieved passages and its own generations using special tokens called reflection tokens. This framework aims to improve the generation quality and factuality of the language model without compromising its original creativity and versatility.\n",
      "=== Calling Function ===\n",
      "Calling function: summary_tool_longlora with args: {\"input\": \"LongLoRA\"}\n",
      "=== Function Output ===\n",
      "LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing computational costs. It combines shifted sparse attention (S2-Attn) with LoRA to achieve significant computation savings compared to full fine-tuning. This approach allows for extending the context window of LLMs efficiently, demonstrating strong empirical results on various tasks and models. Additionally, LongLoRA retains the original attention architecture during inference and is compatible with existing techniques like Flash-Attention2.\n",
      "=== LLM Response ===\n",
      "Self-RAG is a framework that enhances the quality and factuality of a large language model through retrieval and self-reflection. It trains a single arbitrary language model to adaptively retrieve passages on-demand, generate text informed by these passages, and reflect on both the retrieved passages and its own generations using special tokens called reflection tokens. This framework aims to improve the generation quality and factuality of the language model without compromising its original creativity and versatility.\n",
      "\n",
      "LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing computational costs. It combines shifted sparse attention (S2-Attn) with LoRA to achieve significant computation savings compared to full fine-tuning. This approach allows for extending the context window of LLMs efficiently, demonstrating strong empirical results on various tasks and models. Additionally, LongLoRA retains the original attention architecture during inference and is compatible with existing techniques like Flash-Attention2.\n",
      "assistant: Self-RAG is a framework that enhances the quality and factuality of a large language model through retrieval and self-reflection. It trains a single arbitrary language model to adaptively retrieve passages on-demand, generate text informed by these passages, and reflect on both the retrieved passages and its own generations using special tokens called reflection tokens. This framework aims to improve the generation quality and factuality of the language model without compromising its original creativity and versatility.\n",
      "\n",
      "LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing computational costs. It combines shifted sparse attention (S2-Attn) with LoRA to achieve significant computation savings compared to full fine-tuning. This approach allows for extending the context window of LLMs efficiently, demonstrating strong empirical results on various tasks and models. Additionally, LongLoRA retains the original attention architecture during inference and is compatible with existing techniques like Flash-Attention2.\n"
     ]
    }
   ],
   "source": [
    "response = agent.query(\"Give me a summary of both Self-RAG and LongLoRA\")\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7eede70c",
   "metadata": {},
   "source": [
    "## 2. Setup an agent over 11 papers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18771e69",
   "metadata": {},
   "source": [
    "### Download 11 ICLR papers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "60d01d2c-547f-4054-b0fe-ed9b1a9cc3b5",
   "metadata": {
    "height": 472,
    "tags": []
   },
   "outputs": [],
   "source": [
    "urls = [\n",
    "    \"https://openreview.net/pdf?id=VtmBAGCN7o\",\n",
    "    \"https://openreview.net/pdf?id=6PmJoRfdaK\",\n",
    "    \"https://openreview.net/pdf?id=LzPWWPAdY4\",\n",
    "    \"https://openreview.net/pdf?id=VTF8yNQM66\",\n",
    "    \"https://openreview.net/pdf?id=hSyW5go0v8\",\n",
    "    \"https://openreview.net/pdf?id=9WD9KwssyT\",\n",
    "    \"https://openreview.net/pdf?id=yV6fD7LYkF\",\n",
    "    \"https://openreview.net/pdf?id=hnrB5YHoYu\",\n",
    "    \"https://openreview.net/pdf?id=WbWtOYIzIK\",\n",
    "    \"https://openreview.net/pdf?id=c5pwL0Soay\",\n",
    "    \"https://openreview.net/pdf?id=TpD2aG1h0D\"\n",
    "]\n",
    "\n",
    "papers = [\n",
    "    \"metagpt.pdf\",\n",
    "    \"longlora.pdf\",\n",
    "    \"loftq.pdf\",\n",
    "    \"swebench.pdf\",\n",
    "    \"selfrag.pdf\",\n",
    "    \"zipformer.pdf\",\n",
    "    \"values.pdf\",\n",
    "    \"finetune_fair_diffusion.pdf\",\n",
    "    \"knowledge_card.pdf\",\n",
    "    \"metra.pdf\",\n",
    "    \"vr_mcl.pdf\"\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b77426cb",
   "metadata": {
    "tags": []
   },
   "source": [
    "To download these papers, below is the needed code:\n",
    "\n",
    "\n",
    "    #for url, paper in zip(urls, papers):\n",
    "         #!wget \"{url}\" -O \"{paper}\"\n",
    "    \n",
    "    \n",
    "**Note**: The pdf files are included with this lesson. To access these papers, go to the `File` menu and select`Open...`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "ea5ee34d-02ac-4537-ae20-7ef6c5767172",
   "metadata": {
    "height": 149,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Getting tools for paper: metagpt.pdf\n",
      "Getting tools for paper: longlora.pdf\n",
      "Getting tools for paper: loftq.pdf\n",
      "Getting tools for paper: swebench.pdf\n",
      "Getting tools for paper: selfrag.pdf\n",
      "Getting tools for paper: zipformer.pdf\n",
      "Getting tools for paper: values.pdf\n",
      "Getting tools for paper: finetune_fair_diffusion.pdf\n",
      "Getting tools for paper: knowledge_card.pdf\n",
      "Getting tools for paper: metra.pdf\n",
      "Getting tools for paper: vr_mcl.pdf\n"
     ]
    }
   ],
   "source": [
    "from utils import get_doc_tools\n",
    "from pathlib import Path\n",
    "\n",
    "paper_to_tools_dict = {}\n",
    "for paper in papers:\n",
    "    print(f\"Getting tools for paper: {paper}\")\n",
    "    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)\n",
    "    paper_to_tools_dict[paper] = [vector_tool, summary_tool]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e35d52c",
   "metadata": {},
   "source": [
    "### Extend the Agent with Tool Retrieval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "20154923-873e-4941-9a3a-4926ab5f9b8c",
   "metadata": {
    "height": 30,
    "tags": []
   },
   "outputs": [],
   "source": [
    "all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "671582f9-70d7-4a8f-b813-58b2a068ca72",
   "metadata": {
    "height": 149,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# define an \"object\" index and retriever over these tools\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.core.objects import ObjectIndex\n",
    "\n",
    "obj_index = ObjectIndex.from_objects(\n",
    "    all_tools,\n",
    "    index_cls=VectorStoreIndex,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c3929882-e9dc-46ca-b495-53e3ed60340e",
   "metadata": {
    "height": 30,
    "tags": []
   },
   "outputs": [],
   "source": [
    "obj_retriever = obj_index.as_retriever(similarity_top_k=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "ba9cfecd-fe14-4da8-b9ba-b3d485d98a03",
   "metadata": {
    "height": 64,
    "tags": []
   },
   "outputs": [],
   "source": [
    "tools = obj_retriever.retrieve(\n",
    "    \"Tell me about the eval dataset used in MetaGPT and SWE-Bench\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "b8802845",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<llama_index.core.tools.query_engine.QueryEngineTool at 0x7f2487e15b10>,\n",
       " <llama_index.core.tools.query_engine.QueryEngineTool at 0x7f2484466bd0>,\n",
       " <llama_index.core.tools.query_engine.QueryEngineTool at 0x7f2488786810>]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "c270ffbf-69c7-48ea-a028-9ba25221cde5",
   "metadata": {
    "height": 30,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ToolMetadata(description='Useful for summarization questions related to '\n",
      "                          'metagpt',\n",
      "              name='summary_tool_metagpt',\n",
      "              fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>,\n",
      "              return_direct=False),\n",
      " ToolMetadata(description='Useful for summarization questions related to metra',\n",
      "              name='summary_tool_metra',\n",
      "              fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>,\n",
      "              return_direct=False),\n",
      " ToolMetadata(description='Useful for summarization questions related to '\n",
      "                          'swebench',\n",
      "              name='summary_tool_swebench',\n",
      "              fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>,\n",
      "              return_direct=False)]\n"
     ]
    }
   ],
   "source": [
    "pprint([tools[i].metadata for i in range(len(tools))])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "9cc0a0b6-9858-4348-9ae0-1cd4160f3fb7",
   "metadata": {
    "height": 251,
    "tags": []
   },
   "outputs": [],
   "source": [
    "from llama_index.core.agent import FunctionCallingAgentWorker\n",
    "from llama_index.core.agent import AgentRunner\n",
    "\n",
    "agent_worker = FunctionCallingAgentWorker.from_tools(\n",
    "    tool_retriever=obj_retriever,\n",
    "    llm=llm, \n",
    "    system_prompt=\"\"\" \\\n",
    "You are an agent designed to answer queries over a set of given papers.\n",
    "Please always use the tools provided to answer a question. Do not rely on prior knowledge.\\\n",
    "\n",
    "\"\"\",\n",
    "    verbose=True\n",
    ")\n",
    "agent = AgentRunner(agent_worker)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "a250cf1a-e011-4994-bcca-4e0294f20864",
   "metadata": {
    "height": 98,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench\n",
      "=== Calling Function ===\n",
      "Calling function: summary_tool_metagpt with args: {\"input\": \"evaluation dataset used in MetaGPT\"}\n",
      "=== Function Output ===\n",
      "The evaluation dataset used in MetaGPT includes HumanEval, MBPP, and SoftwareDev.\n",
      "=== Calling Function ===\n",
      "Calling function: summary_tool_swebench with args: {\"input\": \"evaluation dataset used in SWE-Bench\"}\n",
      "=== Function Output ===\n",
      "The evaluation dataset used in SWE-Bench consists of task instances collected from real GitHub issues and corresponding pull requests across 12 popular Python repositories. It includes task instructions, issue text, retrieved files and documentation, an example patch file, and a prompt for generating the patch file. The dataset aims to present a challenging and realistic arena for evaluating language models in the context of software engineering tasks.\n",
      "=== LLM Response ===\n",
      "The evaluation dataset used in MetaGPT includes HumanEval, MBPP, and SoftwareDev. \n",
      "\n",
      "On the other hand, the evaluation dataset used in SWE-Bench consists of task instances collected from real GitHub issues and corresponding pull requests across 12 popular Python repositories. It includes task instructions, issue text, retrieved files and documentation, an example patch file, and a prompt for generating the patch file. The dataset aims to present a challenging and realistic arena for evaluating language models in the context of software engineering tasks.\n",
      "assistant: The evaluation dataset used in MetaGPT includes HumanEval, MBPP, and SoftwareDev. \n",
      "\n",
      "On the other hand, the evaluation dataset used in SWE-Bench consists of task instances collected from real GitHub issues and corresponding pull requests across 12 popular Python repositories. It includes task instructions, issue text, retrieved files and documentation, an example patch file, and a prompt for generating the patch file. The dataset aims to present a challenging and realistic arena for evaluating language models in the context of software engineering tasks.\n"
     ]
    }
   ],
   "source": [
    "response = agent.query(\n",
    "    \"Tell me about the evaluation dataset used \"\n",
    "    \"in MetaGPT and compare it against SWE-Bench\"\n",
    ")\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "8084c8cb-98ed-4835-aaa4-5b0c7254be6d",
   "metadata": {
    "height": 81,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. \n",
      "=== Calling Function ===\n",
      "Calling function: summary_tool_longlora with args: {\"input\": \"LongLoRA\"}\n",
      "=== Function Output ===\n",
      "LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing computational costs. It combines shifted sparse attention (S2-Attn) with LoRA to efficiently increase the context window of pre-trained models. This approach aims to save trainable parameters and memory costs compared to full fine-tuning, making it a popular choice for adapting LLMs to different datasets. Additionally, LongLoRA bridges the performance gap between LoRA and full fine-tuning by allowing the training of normalization and embedding layers, crucial for adapting LLMs to longer contexts.\n",
      "=== Calling Function ===\n",
      "Calling function: summary_tool_loftq with args: {\"input\": \"LoftQ\"}\n",
      "=== Function Output ===\n",
      "LoftQ is a quantization framework that integrates low-rank approximation with quantization to jointly approximate the original high-precision pre-trained weights. It aims to provide an efficient initialization point for subsequent Low-Rank Adaptation (LoRA) fine-tuning, leading to enhanced performance in downstream tasks. LoftQ demonstrates improved performance and stability compared to other quantization methods, particularly excelling in low-bit quantization scenarios.\n",
      "=== LLM Response ===\n",
      "LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing computational costs. It combines shifted sparse attention (S2-Attn) with LoRA to efficiently increase the context window of pre-trained models. This approach aims to save trainable parameters and memory costs compared to full fine-tuning, making it a popular choice for adapting LLMs to different datasets. Additionally, LongLoRA bridges the performance gap between LoRA and full fine-tuning by allowing the training of normalization and embedding layers, crucial for adapting LLMs to longer contexts.\n",
      "\n",
      "On the other hand, LoftQ is a quantization framework that integrates low-rank approximation with quantization to jointly approximate the original high-precision pre-trained weights. It aims to provide an efficient initialization point for subsequent Low-Rank Adaptation (LoRA) fine-tuning, leading to enhanced performance in downstream tasks. LoftQ demonstrates improved performance and stability compared to other quantization methods, particularly excelling in low-bit quantization scenarios.\n"
     ]
    }
   ],
   "source": [
    "response = agent.query(\n",
    "    \"Compare and contrast the LoRA papers (LongLoRA, LoftQ). \"\n",
    "    \"Analyze the approach in each paper first. \"\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
